POLI 572B

Michael Weaver

February 9, 2024

Design-Based Inference

Design-Based Inference

  • Contrast to conditioning
  • Difference-in-Differences
  • Natural Experiments

Design vs Conditioning

Design vs. Model-based inferences

A matter of degree

Statistical evidence for causality combines observed data with a mathematical model of the world

  • mathematical model describes how observed data are generated
  • model always involves assumptions
  • can almost always apply the model, even if it is wrong

Design vs. Model-based inferences

Causal evidence varies in terms of complexity of math/assumptions: a matter of degree

  • Model-based inferences about causality depend on complex statistical models with many assumptions

  • Design-based inferences about causality use carefully controlled comparisons with simple, transparent models and assumptions

Design vs. Model-based inferences

Whatever our approach…


do the assumptions needed to use this mathematical tool reasonably fit reality?

Conditioning vs. Design

When we condition, we block specific backdoor paths that generate confounding by comparing cases that are similar on observable traits. We assume:

  1. Ignorability/Conditional Independence
  2. Positivity/Common Support
  3. No measurement error
  • Explain what these mean:
  • Can we test or prove these assumptions?

Conditioning vs. Design

Conditioning and model dependence:

  • we did exact matching last week: no model dependence, but curse of dimensionality
  • with finite cases, lots of variables, we use a mathematical model to plug-in/impute missing counterfactuals

These models bring additional modelling assumptions:

  • linearity, additivity, etc.

On top of which: we always need to argue that we have blocked all backdoor paths

Conditioning vs. Design

Instead of relying on conditioning, we might choose more careful research designs:

  • comparing the same unit over time
  • differences-in-differences
  • “natural experiments”

Here, the structure of the comparison motivates an argument for independence of cause and potential outcomes. Rather than block specific confounding variables, we eliminate confounding due to a class of confounding variables.

Design

Media Effects on Political Attitudes

Social and political theorists have frequently argued that media—by shaping perceptions of events in the world, exposing people to narrative frames—affects beliefs and behaviors.

  • What might be some obstacles to evaluating the causal effect of media exposure?
  • Why might experiments be a poor strategy?
  • Why might conditioning not work very well?

[board]

Media Effects

Foos and Bischoff (2022) examine the effect of changing exposure to The Sun on anti-EU attitudes and voting in the UK.

  • The Sun is a tabloid with long-standing anti-EU coverage and editorial slant
  • They ask: did exposure to the Sun’s coverage change attitudes about the EU?

Conditioning?

We could simply compare attitudes about the EU in areas with greater and less readership of The Sun (or among people who read The Sun versus those who do not):

  • What variables would you want to condition on? (Let’s draw a DAG)
  • What might be some reasons to doubt the effectiveness of this approach?

Alternatives:

In 1989, ~100 fans of Liverpool FC died in a stampede at a match (the Hillsborough disaster):

  • The Sun blamed the victims for the tragedy
  • Fans and teams in the Liverpool area boycotted The Sun.

Is there any way to make use of this event to learn about the effect of reading The Sun?

Interrupted Time Series

We could compare attitudes toward the EU in Liverpool before and after the boycott: this is sometimes called an interrupted time series

  • Compare this to conditioning: how is it different? what possible advantages/disadvantages are there?
  • Block all confounding variables that remain constant over time within Liverpool
  • No need to know what these variables are, measure them, or specify a model, etc.

Interrupted Time Series

Plug in the observed outcome before treatment for the counterfactual outcome after treatment: \(t=1\) is post-treatment, \(t=0\) is pre-treatment.

\[\tau_i = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]

Plugging in:

\[\widehat{\tau_i} = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\]

  • What can go wrong here?
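The plug-in step can be made concrete with a minimal sketch. All numbers here are hypothetical, chosen purely for illustration:

```python
# Interrupted time series as a plug-in estimator (hypothetical numbers).
# The observed pre-boycott outcome stands in for the unobserved
# post-boycott counterfactual Y(0) in Liverpool.
y_post_treated = 0.35  # share anti-EU, Liverpool post-1989 (observed, boycott)
y_pre_treated = 0.50   # share anti-EU, Liverpool pre-1989 (observed, no boycott)

# tau_hat: pre-treatment outcome plugged in for the missing
# post-treatment counterfactual
tau_hat = y_post_treated - y_pre_treated
print(round(tau_hat, 2))  # -0.15
```

The entire estimate rests on treating the pre-period outcome as if nothing else would have changed it over time.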

Interrupted Time Series

In order for interrupted time series to work, we must assume:

\[\overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}} = \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]

That in the absence of the treatment, outcomes of \(Y\) would not have changed from before to after treatment.

  • Why might this be incorrect in general? In the context of the Liverpool Sun boycott?

Interrupted Time Series

The assumptions imply that none of the following occurred:

  • “history”: bias due to other short-term changes
  • “maturation”: bias due to long-term trends
  • “testing”: bias related to selection into “treatment” (trends lead to policy)
  • “regression”: extreme events lead to treatment, followed by return to the norm
  • “instrumentation”: treatment induces change in measurement
  • How might we avoid these pitfalls?

Interrupted Time Series

SUPER IMPORTANT: If there is some other factor that changes over time and affects \(Y\), it can induce bias …

EVEN IF IT DOES NOT CAUSE \(D\).

The comparison holds the unit constant before and after the event \(\to\) collider bias: conditioning on the unit generates dependencies between variables that move together over time.

Interrupted Time Series

A good example of how to do this persuasively:

Mummolo 2018

Interrupted Time Series

What kind of causal effect are we estimating when we do before and after comparisons?

\[\begin{split}E[\tau_i | D_i = 1] = {} \frac{1}{n}\sum\limits^{n}_{i=1} & [Y_{i,t=1}(1) | D_i = 1] - \\ & \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\end{split}\]

Is this the average causal effect?

  • It is the average treatment effect on treated

Interrupted Time Series

Before-after comparisons assume no other changes in outcomes over time, but it is almost always true that

\(\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1] \neq 0\)

i.e., counterfactually, in the absence of treatment \(D\), potential outcomes \(Y_i(0)\) are changing over time.

Observed pre-treatment outcomes are not a good substitute for post-treatment counterfactual outcomes.

  • Do we have a way to estimate what these counterfactual trends absent treatment might be?

In our example: we don’t know how EU skepticism might have trended in Liverpool absent the boycott. We do know how EU skepticism in the rest of the UK trended absent the boycott.

We don’t know: \(\color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\)

We do know: \(\underbrace{[Y_{i,t=1}(0) | D_i = 0]}_{\text{UK post-1989, no boycott}} - \underbrace{[Y_{i,t=0}(0)|D_i = 0]}_{\text{UK pre-1989, no boycott}}\)

Difference-in-Differences:

Difference-in-differences compares changes in the treated cases against changes in untreated cases.

We use the trends in the untreated cases to plug in for the \(\color{red}{\text{counterfactual}}\) trends (absent treatment) in the treated cases

Difference-in-Differences:

If we assume:

\[\{\overbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}^{\text{Treated counterfactual trend}}\} = \\ \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\]

Then we can plug in the observed untreated group trend for the \(\color{red}{\text{counterfactual}}\) treated group trend.

Difference-in-Differences:

This is the parallel trends assumption. It is equivalent to saying there are no time-varying confounding variables that differ between treated and untreated.

If it is true, we can do some simple algebra and find that

\([\tau_i | D_i = 1] = [Y_{i,t=1}(1) | D_i = 1] - \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\)


\(\begin{equation}\begin{split}[\tau_i | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}_{\text{Treated counterfactual trend}}\}\end{split}\end{equation}\)

\(\begin{equation}\begin{split}[\widehat{\tau_i} | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\end{split}\end{equation}\)

And this gives us the name:

  • \(Treated_{post} - Treated_{pre}\) (first difference)
  • \(Untreated_{post} - Untreated_{pre}\) (first difference)
  • \((Treated_{post} - Treated_{pre}) - \\ (Untreated_{post} - Untreated_{pre})\)
  • Difference in differences
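The two first differences and their difference can be computed directly. A minimal sketch with hypothetical group-period means (all values assumed for illustration, in percentage points):

```python
# Difference-in-differences from four group-period means
# (hypothetical percentages holding anti-EU attitudes).
treated_pre, treated_post = 60, 45      # Liverpool
untreated_pre, untreated_post = 55, 58  # rest of the UK

first_diff_treated = treated_post - treated_pre        # -15
first_diff_untreated = untreated_post - untreated_pre  # +3

# The untreated trend (+3) is subtracted off as the counterfactual trend
did = first_diff_treated - first_diff_untreated
print(did)  # -18
```

Note that the naive before-after comparison (-15) and the DiD estimate (-18) differ exactly by the untreated trend.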

Difference-in-differences:

Their results show that the boycott of The Sun reduced EU skepticism in Liverpool

Under the parallel trends assumption (untreated cases have the same trends as treated cases in the absence of treatment)

  • This is an unbiased estimate of the Average Treatment on Treated (\(ATT\)):
    • the effect of treatment on the treated cases
    • it is TOTALLY fine if the effect of treatment on untreated cases would have been different

Difference-in-differences:

If parallel trends assumption holds, what kinds of confounding does this design eliminate?

  • Any time-invariant confounders within units (unchanging causes of \(Y\))
  • Any time-varying confounders that are shared by both treated cases and un-treated cases (causes of \(Y\) that change similarly in both groups)
  • Crucial to choose the right “control” cases: ones plausibly on parallel trends, e.g., that share as many time-varying confounders as possible.

What are examples of confounders held constant in Sun Boycott difference-in-differences?

Difference-in-differences:

In the newspaper example: what would be an example in which some other factor generates bias that is not solved by the difference-in-differences estimate?

Difference-in-differences:

Estimation:

  • Classical scenario: two “treatment” conditions, two time periods: calculate the difference-in-differences in means
  • We can use regression: a dummy for ever-treated, a dummy for post-treatment, and their interaction

\(Y_{it} = \beta_0 + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)

\(Y_{it} = \overbrace{\alpha_i}^{\text{dummies for each } i} + \underbrace{\alpha_t}_{\text{dummies for each } t} + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)
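A sketch of the regression version on simulated data (the coefficients, group sizes, and noise level are all assumed for illustration). With the saturated two-group, two-period specification, the interaction coefficient reproduces the difference-in-differences of the four cell means exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group, two-period panel: 100 units, half ever-treated,
# true DiD effect of -0.15 built into the data-generating process.
n = 100
treated = np.repeat([0, 1], n // 2)
rows = []
for post in (0, 1):
    y = (0.5 + 0.1 * treated + 0.05 * post - 0.15 * treated * post
         + rng.normal(0, 0.02, n))
    rows.append(np.column_stack([np.ones(n), treated, np.full(n, post),
                                 treated * post, y]))
data = np.vstack(rows)
X, y = data[:, :4], data[:, 4]  # columns: const, Treated, Post, interaction

# OLS via least squares; beta[3] is the interaction coefficient
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Compare against the DiD of the four group-period means
def cell(t, p):
    return y[(X[:, 1] == t) & (X[:, 2] == p)].mean()

did_means = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
print(np.isclose(beta[3], did_means))  # True
```

The regression form is convenient because it extends to covariates and to the two-way fixed-effects version with unit and period dummies.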

Difference-in-differences:

How do we validate the parallel trends assumption?

  • We cannot directly test it, but we can try to see if it is plausible

Placebo Tests:

  • If “treatment” happens between time \(t\) and \(t+1\), we should not see an effect between \(t-1\) and \(t\)!
  • If design is right, there should be no effects of the treatment prior to the treatment taking place.
  • If we “pass” the placebo test, this does not rule out that something other than “treatment” caused the change.
  • You need multiple pre-treatment data points
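With an extra pre-treatment period, the placebo check is just a DiD computed over the two pre-periods. A sketch with hypothetical means (values assumed for illustration, in percentage points):

```python
# Placebo test sketch: treatment actually occurs between t0 and t1,
# so a fake "treatment" placed between t-1 and t0 should show ~no effect.
treated = {"t-1": 62, "t0": 60, "t1": 45}   # Liverpool (hypothetical)
control = {"t-1": 57, "t0": 55, "t1": 58}   # rest of the UK (hypothetical)

# Placebo DiD over the two pre-treatment periods: both groups fell by 2,
# consistent with parallel trends
placebo_did = ((treated["t0"] - treated["t-1"])
               - (control["t0"] - control["t-1"]))

# Actual DiD over the treatment window
real_did = ((treated["t1"] - treated["t0"])
            - (control["t1"] - control["t0"]))
print(placebo_did, real_did)  # 0 -18
```

A placebo estimate near zero is consistent with parallel trends but does not prove them for the post-treatment period.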

Difference-in-differences: Extensions

Staggered Treatment: Goodman-Bacon 2021

  • some provinces pass laws, but some pass them later than others

Multiple Treatments: https://arxiv.org/pdf/1803.08807.pdf

  • units can enter and exit treatment - or there are different treatments they get

Continuous Treatment: https://psantanna.com/files/Callaway_Goodman-Bacon_SantAnna_2021.pdf

Covariates: even conditioning on time-varying confounders has risks

As we get away from simple DiD, assumptions and potential problems multiply, solutions get more complicated…

An Example:

What was the effect of enlistment in the US Civil War on voting for the Republican party?

\(GOP_{ie} = \alpha_i + \alpha_e + \beta Enlist_i \times PostWar_e + \epsilon_y + \epsilon_i\)

  • \(i\) is a county
  • \(e\) is a state-election (election year in a specific state)
  • \(y\) is a year between 1854 and 1880
  • \(PostWar\) indicates if election is after 1861 \((1)\) or before \((0)\)


Difference-in-differences:

Assumptions:

  • parallel trends: treated and untreated have same changes over time except for shift in treatment.
  • equivalently: no time-varying confounding; no factors affecting \(Y\) that change with \(D\)

Caveats:

  • ‘check’ parallel trends
  • careful: what happens when units are treated at different times? different amounts?
  • careful: STANDARD ERRORS! Clustering may not save you with small \(N\).

Natural Experiments

Natural Experiments:

Address confounding in a different way:

  • exploit random or as-if random allocation of \(D\) to find causal effects
  • look for “naturally” occurring randomization

Natural Experiments:

Distinguishing “natural experiment” from experiments:

  • Treatment and control groups - or different levels of treatment
  • Random assignment (e.g., government program with lottery)
  • “as-if” random assignment (we claim process is random)
  • we don’t do the manipulation

Natural Experiments:

An observational study in which causal inference comes from a design that exploits randomization.

  • contrast to difference-in-differences
  • depends on argument to validate “as-if” random assumption \(\to\) fewer assumptions in the statistical model

Regression Discontinuity

  • “Treatment” occurs for cases on one side of a threshold, not for cases on the other side
  • “Near” the threshold, cases are effectively randomly assigned.
    • all units treated when above the threshold (sharp RD)
    • treatment can increase at the threshold (“fuzzy” RD) (e.g. Mo 2018)
  • Close elections are a common application

Regression Discontinuity

Two approaches:

  • Potential outcomes of \(Y(0)\) are approximately linear near the threshold; treatment shifts outcomes
    • so we (flexibly) model slope of \(Y\) on either side of the cut-off
  • Potential outcomes of \(Y\) are random near the cut-off
    • difference in means near the cutoff

Regression Discontinuity

Decisions:

  • model outcome around cutoff
    • bias/variance trade-off
    • how do you model it? linear? quadratic? cubic? quartic? local linear?
    • how do you choose the “bandwidth” around the cutoff to use?
    • typically recommend local linear, automatic choice of ‘optimal bandwidth’
  • difference in means at cutoff
    • how do you choose the “bandwidth” around the cutoff to use?

Regression Discontinuity

Assumptions:

  • assignment to treatment at the threshold is random
  • AND
    • either we have correctly modeled potential outcomes of \(Y\) on either side of cutoff
    • or we have used “bandwidth” around cutoff within which treatment is as-if random assigned (difference in means)
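Both decisions show up in a minimal local linear sketch on simulated data (the cutoff, bandwidth, and true jump of 2.0 are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sharp RD sketch: treatment switches on at cutoff c = 0;
# the data-generating process builds in a jump of 2.0 at the cutoff.
n, c, h = 2000, 0.0, 0.25          # sample size, cutoff, bandwidth
x = rng.uniform(-1, 1, n)          # running variable
d = (x >= c).astype(float)         # sharp assignment rule
y = 1.0 + 0.8 * x + 2.0 * d + rng.normal(0, 0.3, n)

# Local linear fit on each side of the cutoff, within the bandwidth
left = (x >= c - h) & (x < c)
right = (x >= c) & (x <= c + h)
b_left = np.polyfit(x[left], y[left], 1)
b_right = np.polyfit(x[right], y[right], 1)

# RD estimate: gap between the two fitted lines evaluated at the cutoff
tau_rd = np.polyval(b_right, c) - np.polyval(b_left, c)
print(tau_rd)  # estimate of the jump, built in here as 2.0
```

Shrinking the bandwidth `h` reduces bias from curvature in the potential outcomes but raises variance, which is the bias/variance trade-off from the previous slide; in applied work the bandwidth is usually chosen by a data-driven optimal-bandwidth procedure rather than by hand.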

Instrumental Variables

Follows from Wald estimator for non-compliance:

  • Instrumental variables least squares (2SLS) uses the variance in \(D\) “explained” by the “random” \(Z\)
    • \(D\) is “treatment”, \(Z\) is “random assignment to treatment”
    • easy to understand when \(Z\) is binary
    • when \(Z\) is continuous, it is more complicated (and depends on assumptions of linearity)

For more information on using them in practice: Lal, Lockhart, Xu, and Zu 2023
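With a binary instrument, the Wald estimator is just the effect of \(Z\) on \(Y\) scaled by the effect of \(Z\) on \(D\). A sketch on simulated data (the complier shares and the true effect of 1.5 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Wald estimator sketch with a binary instrument.
# Z is randomly assigned; D responds to Z only for compliers;
# the assumed true effect of D on Y is 1.5.
n = 10_000
z = rng.integers(0, 2, n)          # random assignment (the instrument)
u = rng.uniform(size=n)            # latent compliance type
# always-takers (20%), compliers (50%), never-takers (30%); no defiers,
# so monotonicity holds by construction
d = ((u < 0.2) | ((u < 0.7) & (z == 1))).astype(float)
y = 0.5 + 1.5 * d + rng.normal(0, 1, n)

# Wald / IV estimate: reduced form divided by first stage
itt_y = y[z == 1].mean() - y[z == 0].mean()        # effect of Z on Y
first_stage = d[z == 1].mean() - d[z == 0].mean()  # effect of Z on D
wald = itt_y / first_stage
print(wald)  # recovers roughly the 1.5 effect among compliers
```

When the first stage is small (few compliers), the denominator is near zero and the estimate becomes unstable, which is the weak instruments problem discussed on the next slide.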

Instrumental Variables

Problems

  • is variation induced in \(D\) by \(Z\) linear?
  • does \(Z\) actually predict change in \(D\)?
    • “weak instruments problem”: identical to when there are very few compliers in an experiment: inference becomes very hard

Instrumental Variables

Assumptions:

  1. \(Z\) is randomly assigned
  2. Exclusion Restriction
    • no other causal path from \(Z\) to \(Y\), other than through \(D\)
  3. Monotonicity
    • increasing \(Z\) uniformly increases or decreases values of \(D\)

Natural Experiments: Choices

  • Sometimes natural experiments are as-if random only conditional on other variables (Nellis, et al.)
    • What is the correct functional form for the conditioning variables?
  • Standard Errors:
    • What is the unit at which treatment is assigned?
    • Cluster/Randomization inference

Natural Experiments: Limitations

  • Effects are “local” or for “compliers”
  • Effects for sub-group with randomization
    • like in non-compliance, we find subset of cases that are randomly induced into treatment/responsive
    • cases may be different from population of interest
  • Causal exposures may be different
    • cause applied at cutoff might be different than elsewhere
    • cause applied at random might be different (e.g. rainfall as instrument for economic growth)

Natural Experiments: What to do

  • Evidence that randomization took place
    • statistical tests of implication
    • qualitative evidence: “causal process observation”
  • Show robustness to choices of analysis